Prioritizing tests of epistasis through hierarchical representation of genomic redundancies
Abstract
Epistasis is defined as a statistical interaction between two or more genomic loci in terms of their association with a phenotype of interest. Epistatic loci identified from Genome-Wide Association Study (GWAS) data provide insights into the interplay among multiple genetic factors. Applications include assessing susceptibility to complex diseases, supporting decision making in precision medicine, and gaining insights into disease mechanisms. GWAS assay an extremely large number of genomic loci (usually on the order of millions), so identifying epistatic loci is statistically difficult and computationally intensive. Even when only pairwise interactions are considered, the search space ranges from hundreds of millions to billions of locus pairs. The large number of statistical tests also makes adequate correction for type I error imperative. Consequently, efficient algorithms are needed to filter the tests that are performed and to evaluate large GWAS data sets in a reasonable amount of computation time. It has been observed that many pairwise tests are redundant because of correlations in genotype values across samples, known as linkage disequilibrium. However, existing algorithms for efficient identification of epistatic loci do not systematically exploit linkage disequilibrium. Here, we propose a new algorithm for fast epistasis detection based on a hierarchical representation of linkage disequilibrium (LinDen). We use redundancies in genotype patterns between neighboring loci to build a hierarchical structure. We then execute a branch-and-bound search that prioritizes locus testing based on approximations of a test statistic for pairs of locus groups. Because LinDen organizes its tests hierarchically, the search scales efficiently with the number of screened loci.
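The abstract does not specify LinDen's grouping rule, test statistic, or bound. The Python sketch below therefore illustrates only the simplest form of the redundancy it describes. Runs of neighboring loci whose genotype vectors are identical across samples yield identical pairwise statistics. As a result, one representative test per pair of groups covers every locus pair. This is exact redundancy elimination, not LinDen's hierarchical branch-and-bound search. Several details are assumptions made for illustration:

- the function names;
- the 0/1/2 genotype coding and the 0/1 phenotype coding;
- the `threshold` parameter;
- the choice of a Pearson chi-square on the 9×2 table of joint genotype versus phenotype, which tests joint association rather than interaction alone.

```python
from itertools import combinations, combinations_with_replacement
from typing import Dict, List, Sequence, Tuple


def chi_square_joint(g1: Sequence[int], g2: Sequence[int], y: Sequence[int]) -> float:
    """Pearson chi-square on the table of joint genotype (g1, g2) vs. 0/1 phenotype y.

    Assumed stand-in statistic: genotypes coded 0/1/2, phenotype coded 0/1.
    """
    n = len(y)
    if n == 0:
        return 0.0
    # rows[(a, b)] = [controls, cases] carrying joint genotype (a, b)
    rows: Dict[Tuple[int, int], List[int]] = {}
    for a, b, label in zip(g1, g2, y):
        rows.setdefault((a, b), [0, 0])[label] += 1
    col_totals = [sum(1 for v in y if v == 0), sum(1 for v in y if v == 1)]
    stat = 0.0
    for row in rows.values():
        row_total = row[0] + row[1]
        for k in (0, 1):
            expected = row_total * col_totals[k] / n
            if expected > 0:  # empty phenotype columns contribute nothing
                stat += (row[k] - expected) ** 2 / expected
    return stat


def group_identical_neighbors(genotypes: List[List[int]]) -> List[List[int]]:
    """Group consecutive loci whose genotype vectors are identical (hypothetical helper)."""
    groups: List[List[int]] = []
    for idx, vec in enumerate(genotypes):
        if groups and genotypes[groups[-1][0]] == vec:
            groups[-1].append(idx)  # redundant with the group's representative
        else:
            groups.append([idx])  # start a new group; idx is its representative
    return groups


def screen_pairs(
    genotypes: List[List[int]], y: Sequence[int], threshold: float
) -> Tuple[List[Tuple[int, int, float]], int]:
    """Test one representative per group pair; expand hits to all locus pairs.

    Returns (hits as (i, j, statistic) with i < j, number of tests performed).
    """
    groups = group_identical_neighbors(genotypes)
    hits: List[Tuple[int, int, float]] = []
    tests = 0
    for a, b in combinations_with_replacement(range(len(groups)), 2):
        ga, gb = groups[a], groups[b]
        if a == b and len(ga) < 2:
            continue  # a singleton group contains no locus pair
        tests += 1
        stat = chi_square_joint(genotypes[ga[0]], genotypes[gb[0]], y)
        if stat >= threshold:
            # Every locus pair drawn from these groups has exactly this statistic.
            if a == b:
                hits.extend((i, j, stat) for i, j in combinations(ga, 2))
            else:
                hits.extend((i, j, stat) for i in ga for j in gb)
    return hits, tests
```

Consider four loci in which the first two share a genotype vector. With a threshold of 0, `screen_pairs` evaluates four group pairs yet still reports all six locus pairs. The approach in the abstract goes further: it groups loci by redundancy in genotype patterns and bounds approximated statistics for locus groups. This exact-match sketch attempts neither.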
We test LinDen comprehensively on three data sets obtained from the Wellcome Trust Case Control Consortium: type 2 diabetes, psoriasis, and hypertension. Our results show that, compared with other state-of-the-art tools for fast epistasis detection, LinDen drastically reduces the number of tests performed while still discovering statistically significant locus pairs. LinDen is implemented in C++ and is available as open source at http://compbio.case.edu/linden/.